
Conversation

@AlSchlo
Contributor

@AlSchlo AlSchlo commented Oct 20, 2025

This PR introduces Panorama into HNSWFlat, following our paper (https://arxiv.org/pdf/2510.00566). Panorama achieves up to 4× lower latency on higher-dimensional data, making it a great option for medium-sized datasets that don't benefit much from quantization.

Below are some benchmarks on SIFT-128, GIST-960, and synthetic 2048-dimensional data. I recommend checking out the paper for more results. As expected, Panorama is not a silver bullet when combined with HNSW—it’s only worthwhile for high-dimensional data.

It might be worth considering, in the future, adding a function that dynamically sets the number of levels. However, this would require reorganizing the cumulative sums.
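To give a rough sense of what the index does internally, below is a small numpy sketch of level-wise pruning with precomputed cumulative norms. This is my own illustration based on the cum-sum machinery visible in this PR (compute_cum_sums, query_cum_sums); the helper names and the exact bound are assumptions, not the shipped FAISS code.

```python
# Illustration only: level-wise L2 pruning with precomputed suffix norms.
# Names and the reverse-triangle-inequality bound are assumptions for
# exposition; this is not the FAISS/Panorama API.
import numpy as np

def suffix_sq_norms(x, num_levels):
    """out[j] = squared norm of the tail of x starting at level j."""
    segs = np.array_split(x, num_levels)
    sq = np.array([np.dot(s, s) for s in segs])
    out = np.zeros(num_levels + 1)
    out[:-1] = np.cumsum(sq[::-1])[::-1]
    return out

def pruned_l2(q, x, q_cum, x_cum, num_levels, threshold):
    """Accumulate the distance level by level; bail out as soon as a lower
    bound on the final distance exceeds the current threshold."""
    q_segs = np.array_split(q, num_levels)
    x_segs = np.array_split(x, num_levels)
    partial = 0.0
    for lvl in range(num_levels):
        diff = q_segs[lvl] - x_segs[lvl]
        partial += np.dot(diff, diff)
        # Reverse triangle inequality on the levels not processed yet.
        remaining_lb = (np.sqrt(q_cum[lvl + 1]) - np.sqrt(x_cum[lvl + 1])) ** 2
        if partial + remaining_lb > threshold:
            return None  # pruned; the exact distance is never needed
    return partial
```

The pruning only pays off once the dimensions it skips outweigh the per-level bookkeeping, which is also why a fixed 8 levels hurts at 128 dimensions (see the SIFT note below).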

SIFT-128

Note: SIFT-128 performs slightly worse here than in our paper because we use 8 levels, whereas the paper explored several level configurations. Eight levels introduce quite a bit of overhead for 128-dimensional data, but I kept it consistent across all benchmarks for comparison.

bench_hnsw_flat_panorama_SIFT1M

GIST-960

bench_hnsw_flat_panorama_GIST1M

Synthetic-2048

bench_hnsw_flat_panorama_Synthetic2048D

@meta-cla meta-cla bot added the CLA Signed label Oct 20, 2025
Contributor

@mdouze mdouze left a comment


Thanks for the PR.
About Panorama in general: would it be feasible to make an IndexRefine that supports FlatPanorama as a refinement index?
The reason is that it may be more efficient to do all the non-exhaustive searches in low dimension and refine the result list at the end.
This would also make it possible to apply Panorama to low-accuracy, fast indexes like the FastScan and RaBitQ indexes.
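To make the suggestion concrete, a rough Python sketch of that wiring could look as follows. The IndexRefine part is existing FAISS API; the Panorama-accelerated flat refinement index is hypothetical at this point, so a plain IndexFlat stands in for it.

```python
# Sketch of the proposed setup: a fast, low-accuracy base index produces a
# candidate list that a flat (eventually Panorama-accelerated) index re-ranks.
import faiss
import numpy as np

d = 960
xb = np.random.rand(20_000, d).astype('float32')
xq = np.random.rand(10, d).astype('float32')

base = faiss.index_factory(d, "IVF256,PQ60x4fs")  # non-exhaustive search on compact codes
refine = faiss.IndexFlat(d)                       # stand-in for a future Panorama flat index
index = faiss.IndexRefine(base, refine)

index.train(xb)
index.add(xb)
index.k_factor = 10         # re-rank 10x more candidates than requested
D, I = index.search(xq, 5)
```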

@mdouze
Contributor

mdouze commented Oct 20, 2025

Please share any performance comparison you have with this code vs. the HNSWFlat implementation.
Since the data is not contiguous, the performance profile could be different from an IVFFlat index.

@AlSchlo
Contributor Author

AlSchlo commented Oct 20, 2025

@mdouze thanks for the review

  1. Yes, we can use it in IndexRefine; this is a good idea. I would assume that IndexRefine does not keep its vectors sequential in memory by design? If not, this is OK but sub-optimal, as the gains of Panorama are more modest in the presence of all those cache misses. We cover this in the paper.

  2. Performance of HNSW is benchmarked in the paper too; it's still worth it on higher-dimensional data, but much more ad hoc than IndexIVFFlatPanorama. We will include some benchmarks with this new, cleaned-up code. Here is the graph from the paper.

image
  3. Panorama can work on IVFPQ (including FastScan), but integration there is a bigger effort (supporting all AVX targets, etc.), as we have to interleave the codes to keep the SIMD lanes busy. In fact, this is where we see the best speedups.

@alexanderguzhva
Contributor

@AlSchlo is it worth making the default (UB + LB) / 2 behavior configurable by allowing, say, other options such as just LB?
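(For readers following along: the knob being discussed would select which estimate of the not-yet-computed part of the distance gets compared against the pruning threshold. A hypothetical sketch, not code from this PR, with made-up names:)

```python
# Hypothetical illustration of the suggested option; lb/ub are the norm-based
# lower/upper bounds on the remaining part of the distance.
def remaining_estimate(lb, ub, mode="midpoint"):
    if mode == "midpoint":      # the (UB + LB) / 2 default discussed above
        return 0.5 * (lb + ub)
    if mode == "lower_bound":   # prune only on the provable lower bound
        return lb
    raise ValueError(f"unknown mode: {mode}")
```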

@AlSchlo
Contributor Author

AlSchlo commented Oct 20, 2025

@alexanderguzhva Excellent suggestion! So we actually used to have an epsilon knob there, but we ended up not covering it in the paper. It's a knob that IMO just adds confusion and makes the workload more unpredictable.

We did not study it in more detail as the paper was getting too dense.

@mnorris11

Sorry for the delay on these PRs, I'm still conducting some benchmarking.

@aknayar
Contributor

aknayar commented Nov 7, 2025

@mnorris11 No worries, and thank you so much for the reviews! As an update, after #4645 is confirmed, I have a local build of IndexRefinePanorama ready to submit with really promising results (2x E2E speedups on GIST with IVF256,PQ60x4fs as the base index vs. L2Flat as the refine index—seen below). I think speedups of 3x and above could be expected from more amenable datasets (OpenAI's DBpedia-Large, etc.).

image
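For reference, the non-Panorama baseline of that configuration can be expressed with today's FAISS factory syntax roughly as below; the Panorama-accelerated refinement stage is the part the follow-up PR would add.

```python
# Baseline for the setup described above: IVF256,PQ60x4fs base index with a
# flat L2 re-ranking stage (standard FAISS, no Panorama involved yet).
import faiss
import numpy as np

d = 960                                   # GIST-960
xb = np.random.rand(20_000, d).astype('float32')
xq = np.random.rand(100, d).astype('float32')

index = faiss.index_factory(d, "IVF256,PQ60x4fs,Refine(Flat)")
index.train(xb)
index.add(xb)
D, I = index.search(xq, 10)
```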

@mnorris11

The benchmarks look good on my end!

Regarding the get_query() on the FlatCodesDistanceComputer: should we just add a const float* q at the FlatCodesDistanceComputer level and remove the const float* query and const float* q from the derived structs? I know not all of them use it (e.g., RaBitDistanceComputer), but it can just remain nullptr for those structs. And then can we remove get_query()?

@AlSchlo
Contributor Author

AlSchlo commented Nov 14, 2025

Hi @mnorris11 ,

To clarify: are you suggesting to make get_query() virtual?
If so, yes this is possible. I hold no strong opinions on this matter.

@mnorris11

mnorris11 commented Nov 14, 2025

Hi @mnorris11 ,

To clarify: are you suggesting to make get_query() virtual? If so, yes this is possible. I hold no strong opinions on this matter.

@AlSchlo I mean, is it possible to just have a q member on the parent struct rather than having this new get_query()? Then, when we need to access it, we can do const float* query = flat_codes_qdis->q; in HNSW.cpp?

How about something like this patch file below? Feel free to note any cons or pitfalls that I'm missing.

cc @mdouze if you have other opinions. IMO it is only a small style thing. After we decide one way or another, this one should be good to merge?

From 975e0a2868f5f2aa30f0aa62e228876ab3f37bfd Mon Sep 17 00:00:00 2001
From: Michael Norris <[email protected]>
Date: Fri, 14 Nov 2025 12:27:54 -0800
Subject: [PATCH] test to show not having get_query (#4677)

Summary: Pull Request resolved: https://github.com/facebookresearch/faiss/pull/4677

Differential Revision: D87097189


---

diff --git a/faiss/IndexAdditiveQuantizer.cpp b/faiss/IndexAdditiveQuantizer.cpp
--- a/faiss/IndexAdditiveQuantizer.cpp
+++ b/faiss/IndexAdditiveQuantizer.cpp
@@ -58,10 +58,6 @@
         q = x;
     }
 
-    const float* get_query() const override {
-        return q;
-    }
-
     float symmetric_dis(idx_t i, idx_t j) final {
         aq.decode(codes + i * d, tmp.data(), 1);
         aq.decode(codes + j * d, tmp.data() + d, 1);
@@ -81,14 +77,12 @@
     std::vector<float> LUT;
     const AdditiveQuantizer& aq;
     size_t d;
-    const float* q;
 
     explicit AQDistanceComputerLUT(const IndexAdditiveQuantizer& iaq)
             : FlatCodesDistanceComputer(iaq.codes.data(), iaq.code_size),
               LUT(iaq.aq->total_codebook_size + iaq.d * 2),
               aq(*iaq.aq),
-              d(iaq.d),
-              q(nullptr) {}
+              d(iaq.d) {}
 
     float bias;
     void set_query(const float* x) final {
@@ -102,10 +96,6 @@
         }
     }
 
-    const float* get_query() const override {
-        return q;
-    }
-
     float symmetric_dis(idx_t i, idx_t j) final {
         float* tmp = LUT.data();
         aq.decode(codes + i * d, tmp, 1);
diff --git a/faiss/IndexFlat.cpp b/faiss/IndexFlat.cpp
--- a/faiss/IndexFlat.cpp
+++ b/faiss/IndexFlat.cpp
@@ -100,7 +100,6 @@
 struct FlatL2Dis : FlatCodesDistanceComputer {
     size_t d;
     idx_t nb;
-    const float* q;
     const float* b;
     size_t ndis;
     size_t npartial_dot_products;
@@ -126,10 +125,10 @@
     explicit FlatL2Dis(const IndexFlat& storage, const float* q = nullptr)
             : FlatCodesDistanceComputer(
                       storage.codes.data(),
-                      storage.code_size),
+                      storage.code_size,
+                      q),
               d(storage.d),
               nb(storage.ntotal),
-              q(q),
               b(storage.get_xb()),
               ndis(0),
               npartial_dot_products(0) {}
@@ -138,10 +137,6 @@
         q = x;
     }
 
-    const float* get_query() const override {
-        return q;
-    }
-
     // compute four distances
     void distances_batch_4(
             const idx_t idx0,
@@ -250,10 +245,6 @@
         q = x;
     }
 
-    const float* get_query() const override {
-        return q;
-    }
-
     // compute four distances
     void distances_batch_4(
             const idx_t idx0,
@@ -378,10 +369,6 @@
         query_l2norm = fvec_norm_L2sqr(q, d);
     }
 
-    const float* get_query() const override {
-        return q;
-    }
-
     // compute four distances
     void distances_batch_4(
             const idx_t idx0,
diff --git a/faiss/IndexFlatCodes.cpp b/faiss/IndexFlatCodes.cpp
--- a/faiss/IndexFlatCodes.cpp
+++ b/faiss/IndexFlatCodes.cpp
@@ -142,10 +142,6 @@
         query = x;
     }
 
-    const float* get_query() const override {
-        return query;
-    }
-
     float operator()(idx_t i) override {
         codec.sa_decode(1, codes + i * code_size, vec_buffer.data());
         return vd(query, vec_buffer.data());
diff --git a/faiss/IndexPQ.cpp b/faiss/IndexPQ.cpp
--- a/faiss/IndexPQ.cpp
+++ b/faiss/IndexPQ.cpp
@@ -132,10 +132,6 @@
             pq.compute_inner_prod_table(x, precomputed_table.data());
         }
     }
-
-    const float* get_query() const override {
-        return q;
-    }
 };
 
 } // namespace
diff --git a/faiss/impl/DistanceComputer.h b/faiss/impl/DistanceComputer.h
--- a/faiss/impl/DistanceComputer.h
+++ b/faiss/impl/DistanceComputer.h
@@ -113,13 +113,16 @@
     const uint8_t* codes;
     size_t code_size;
 
-    /// Returns a pointer to the currently active query vector.
-    /// Only present for FlatCodesDistanceComputer subclasses
-    /// out of need for Panorama to compute the squared norm of the query.
-    virtual const float* get_query() const = 0;
+    const float* q = nullptr; // not used in all distance computers
 
-    FlatCodesDistanceComputer(const uint8_t* codes, size_t code_size)
-            : codes(codes), code_size(code_size) {}
+    FlatCodesDistanceComputer(
+            const uint8_t* codes,
+            size_t code_size,
+            const float* q = nullptr)
+            : codes(codes), code_size(code_size), q(q) {}
+
+    FlatCodesDistanceComputer(const float* q)
+            : codes(nullptr), code_size(0), q(q) {}
 
     FlatCodesDistanceComputer() : codes(nullptr), code_size(0) {}
 
diff --git a/faiss/impl/HNSW.cpp b/faiss/impl/HNSW.cpp
--- a/faiss/impl/HNSW.cpp
+++ b/faiss/impl/HNSW.cpp
@@ -801,7 +801,7 @@
     std::vector<idx_t> index_array(M);
     std::vector<float> exact_distances(M);
 
-    const float* query = flat_codes_qdis->get_query();
+    const float* query = flat_codes_qdis->q;
     std::vector<float> query_cum_sums(panorama_index->num_panorama_levels + 1);
     IndexHNSWFlatPanorama::compute_cum_sums(
             query,
diff --git a/faiss/impl/RaBitQuantizer.cpp b/faiss/impl/RaBitQuantizer.cpp
--- a/faiss/impl/RaBitQuantizer.cpp
+++ b/faiss/impl/RaBitQuantizer.cpp
@@ -133,14 +133,8 @@
     // the metric
     MetricType metric_type = MetricType::METRIC_L2;
 
-    const float* q = nullptr;
-
     RaBitDistanceComputer();
 
-    const float* get_query() const override {
-        return q;
-    }
-
     float symmetric_dis(idx_t i, idx_t j) override;
 };
 
diff --git a/faiss/impl/ScalarQuantizer.h b/faiss/impl/ScalarQuantizer.h
--- a/faiss/impl/ScalarQuantizer.h
+++ b/faiss/impl/ScalarQuantizer.h
@@ -98,13 +98,7 @@
     SQuantizer* select_quantizer() const;
 
     struct SQDistanceComputer : FlatCodesDistanceComputer {
-        const float* q;
-
-        SQDistanceComputer() : q(nullptr) {}
-
-        const float* get_query() const override {
-            return q;
-        }
+        SQDistanceComputer() : FlatCodesDistanceComputer(nullptr) {}
 
         virtual float query_to_code(const uint8_t* code) const = 0;
 
diff --git a/faiss/utils/extra_distances.cpp b/faiss/utils/extra_distances.cpp
--- a/faiss/utils/extra_distances.cpp
+++ b/faiss/utils/extra_distances.cpp
@@ -126,10 +126,6 @@
     void set_query(const float* x) override {
         q = x;
     }
-
-    const float* get_query() const override {
-        return q;
-    }
 };
 
 struct Run_get_distance_computer {

--

@AlSchlo
Contributor Author

AlSchlo commented Nov 14, 2025

@mnorris11 That's a good suggestion, makes it a bit cleaner and less invasive.

idx_t idx = index_array[i];
if (!sel || sel->is_member(idx)) {
if (res.add_result(exact_distances[i], idx)) {
threshold = res.threshold;

@mnorris11 mnorris11 Nov 17, 2025


It seems like threshold is not used after being written to here?

Contributor Author


Indeed, I did this to better mirror the original code. Please check my latest commit, where I made it a bit more idiomatic.

Contributor Author


(just realized this has a little bug which I will fix, condition seems wrong)

Contributor Author

@AlSchlo AlSchlo Nov 18, 2025


Fixed and wrote a test for it! Please check it out :) @mnorris11

@mnorris11

Sorry to mention it after so long: I realized the test is in C++, not Python. Usually we prefer Python tests if possible, to exercise the SWIG layer. Can this be converted?

Otherwise, tests are all passing internally and benchmarks look good.

I will add Panorama for various indexes to the "Guidelines to choose an index" and "Index Factory" wikis after everything is merged.
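In case it helps, a Python test exercising the SWIG layer might look roughly like the sketch below; the IndexHNSWFlatPanorama constructor arguments (dimension, M, number of levels) are my guess from this PR and may not match the final API.

```python
# Rough sketch of a SWIG-level smoke test; constructor arguments and the
# recall threshold are assumptions, not the finalized API or tolerances.
import unittest
import numpy as np
import faiss

class TestHNSWFlatPanorama(unittest.TestCase):
    def test_recall_against_flat(self):
        d, nb, nq, k = 128, 10_000, 50, 10
        rng = np.random.RandomState(123)
        xb = rng.rand(nb, d).astype('float32')
        xq = rng.rand(nq, d).astype('float32')

        ref = faiss.IndexFlatL2(d)          # exact ground truth
        ref.add(xb)
        _, gt = ref.search(xq, k)

        index = faiss.IndexHNSWFlatPanorama(d, 32, 8)  # M=32, 8 levels (assumed signature)
        index.add(xb)
        index.hnsw.efSearch = 64
        _, res = index.search(xq, k)

        # Fraction of queries whose true nearest neighbor appears in the top-k.
        recall = float((res == gt[:, :1]).sum()) / nq
        self.assertGreater(recall, 0.8)

if __name__ == "__main__":
    unittest.main()
```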

@AlSchlo
Contributor Author

AlSchlo commented Nov 18, 2025

Ah, I matched test_hnsw.cpp, let me see what I can do!

@AlSchlo
Contributor Author

AlSchlo commented Nov 18, 2025

Done. I translated it with Claude and then reviewed the output manually. It seems quite accurate to me. @mnorris11

@meta-codesync
Contributor

meta-codesync bot commented Nov 19, 2025

@mnorris11 merged this pull request in 9080fdb.

@facebook-github-bot
Contributor

This pull request has been reverted by c36cd39.

AlSchlo added a commit to AlSchlo/faiss-panorama that referenced this pull request Nov 20, 2025

Pull Request resolved: facebookresearch#4621

Reviewed By: mdouze

Differential Revision: D85902427

Pulled By: mnorris11

fbshipit-source-id: 4db9e950ce0c532494fa99ae93d39ccf06779b5d
